Improving Name Discrimination: A Language Salad Approach
Abstract
This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian, Romanian, and Spanish corpora using information derived from much larger quantities of English data. We also mix occurrences of the ambiguous name found in English with occurrences of the name in the target language, the language in which we are trying to discriminate. We refer to this mixture as a language salad, and find that it often results in even better performance than using only English or only the target language as the source of information for discrimination.
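The mixing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`build_salad`, `select_features`, `vectorize`), the whitespace tokenizer, the `min_count` threshold, and the toy sentences are all assumptions made for illustration. It only shows how pooling English and target-language contexts of a name can surface features that neither source yields alone.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return [t.lower() for t in text.split()]


def build_salad(target_contexts, english_contexts):
    """Pool contexts of the name from both languages into one collection."""
    return list(target_contexts) + list(english_contexts)


def select_features(contexts, name, min_count=2):
    """Keep words, other than the name itself, seen in >= min_count contexts."""
    counts = Counter()
    for ctx in contexts:
        counts.update(set(tokenize(ctx)) - {name.lower()})
    return sorted(w for w, c in counts.items() if c >= min_count)


def vectorize(context, features):
    """Binary vector marking which selected features occur in a context."""
    tokens = set(tokenize(context))
    return [1 if f in tokens else 0 for f in features]


# Hypothetical toy contexts, invented for this sketch only.
spanish = ["Ronaldo marcó un gol con Brasil", "Ronaldo firmó con Madrid"]
english = ["Ronaldo scored for Brasil in the final",
           "Ronaldo joined Madrid last year"]

target_only = select_features(spanish, "Ronaldo")
english_only = select_features(english, "Ronaldo")
salad = select_features(build_salad(spanish, english), "Ronaldo")

vectors = [vectorize(ctx, salad) for ctx in spanish]
```

In this toy setup the salad recovers the shared proper nouns as features, while each single-language source falls below the frequency threshold for them. The resulting vectors could then be passed to any clustering method to group the target-language contexts.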
Similar Resources
Adaptation of TAAABLE to the CCC'2017 Mixology and Salad Challenges, Adaptation of the Cocktail Names
This paper presents the submission of the TAAABLE team to the 2017 Computer Cooking Contest. All challenges except the sandwich challenge are addressed. Online systems have been developed for the salad and mixology challenges by adapting previous successful CCC TAAABLE systems to the requirements of the 2017 challenges. However, this paper presents two main contributions. The first contribution...
Modified Goal Programming Approach for Improving the Discrimination Power and Weights Dispersion
Data envelopment analysis (DEA) is a technique based on linear programming (LP) to measure the relative efficiency of homogeneous units by considering inputs and outputs. The lack of discrimination among efficient decision making units (DMUs) and unrealistic input-output weights are known drawbacks of DEA. In this paper the new scheme based on a goal programming data envelopment an...
A Language Independent Approach for Name Categorization and Discrimination
We present a language independent approach for fine-grained categorization and discrimination of names on the basis of text semantic similarity information. The experiments are conducted for languages from the Romance (Spanish) and Slavonic (Bulgarian) language groups. Despite the fact that these languages have specific characteristics such as word order and grammar, the obtained results are encoura...
Efficiency Analysis Based on Separating Hyperplanes for Improving Discrimination among DMUs
Data envelopment analysis (DEA) is a non-parametric method for evaluating the relative technical efficiency for each member of a set of peer decision making units (DMUs) with multiple inputs and multiple outputs. The original DEA models use positive input and output variables that are measured on a ratio scale, but these models do not apply to the variables in which interval scale data can appe...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...